Study on the effects of intrinsic variation using i-vectors in text-independent speaker verification
نویسندگان
چکیده
Speaker verification performance is adversely affected by mismatches between training and testing data in intrinsic variations. This paper explores how recent technologies focused on modeling the total variability behave in addressing the effects of intrinsic variation in speaker verification. The effects of intrinsic variation are investigated from six aspects including speaking style, speaking rate, speaking volume, emotional state, physical status, and speaking language. The speaker and session variability are modeled with the i-vector framework in the total variability space and the cosine similarity is used as the final decision score in the i-vector based speaker verification system. Intrinsic variations are compensated in the i-vector framework with a variety of techniques, specifically Linear Discriminant Analysis (LDA), Within-Class Covariance Normalization (WCCN) and Nuisance Attribute Projection (NAP). Experiments in the intrinsic corpus show that speaker volume has dramatic effects on the results of speaker verification systems and whisper speech brings the largest degradation of speaker verification performance. The best results are obtained by i-vector modeling with the combined compensation of LDA andWCCN in the i-vector based systems. Compared to the GMM-UBM based system, around 36.76% relative improvement in Equal Error Rate (EER) is obtained in the i-Vector+LDA+WCCN system.
منابع مشابه
CNN-Based Joint Mapping of Short and Long Utterance i-Vectors for Speaker Verification Using Short Utterances
Text-independent speaker recognition using short utterances is a highly challenging task due to the large variation and content mismatch between short utterances. I-vector and probabilistic linear discriminant analysis (PLDA) based systems have become the standard in speaker verification applications, but they are less effective with short utterances. To address this issue, we propose a novel m...
متن کاملDeep Speaker Vectors for Semi Text-independent Speaker Verification
Recent research shows that deep neural networks (DNNs) can be used to extract deep speaker vectors (d-vectors) that preserve speaker characteristics and can be used in speaker verification. This new method has been tested on text-dependent speaker verification tasks, and improvement was reported when combined with the conventional i-vector method. This paper extends the d-vector approach to sem...
متن کاملDNN i-Vector Speaker Verification with Short, Text-Constrained Test Utterances
We investigate how to improve the performance of DNN ivector based speaker verification for short, text-constrained test utterances, e.g. connected digit strings. A text-constrained verification, due to its smaller, limited vocabulary, can deliver better performance than a text-independent one for a short utterance. We study the problem with “phonetically aware” Deep Neural Net (DNN) in its cap...
متن کاملEffects of vocal effort and speaking style on text-independent speaker verification
We study the question of how intrinsic variations (associated with the speaker rather than the recording environment) affect text-independent speaker verification performance. Experiments using the SRI-FRTIV corpus, which systematically varies both vocal effort and speaking style, reveal that (1) “furtive” speech poses a significant challenge; (2) conversations and interviews, despite stylistic...
متن کاملDeep Neural Network Embeddings for Text-Independent Speaker Verification
This paper investigates replacing i-vectors for text-independent speaker verification with embeddings extracted from a feedforward deep neural network. Long-term speaker characteristics are captured in the network by a temporal pooling layer that aggregates over the input speech. This enables the network to be trained to discriminate between speakers from variablelength speech segments. After t...
متن کامل